 ground-truth label


Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and More

Neural Information Processing Systems

One of the chronic problems of deep-learning models is shortcut learning. When the majority of the training data is dominated by a certain feature, neural networks prefer to learn that feature even if it does not generalize outside the training set. Based on the framework of the Neural Tangent Kernel (NTK), we analyzed the case of linear neural networks to derive some important properties of shortcut learning. We defined a "feature" of a neural network as an eigenfunction of the NTK. Then, we found that shortcut features correspond to features with larger eigenvalues when the shortcuts stem from an imbalanced number of samples in the clustered distribution. We also showed that the features with larger eigenvalues still have a large influence on the neural network output even after training, due to data variances in the clusters. Such a preference for certain features remains even when the margin of the neural network output is controlled, which shows that the max-margin bias is not the only major reason for shortcut learning. These properties of linear neural networks are empirically extended to more complex neural networks such as a two-layer fully-connected ReLU network and a ResNet-18.
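The link between cluster imbalance and large eigenvalues can be illustrated with a minimal sketch. It is not the paper's setup: it assumes the simplest possible linear model f(x) = w · x, whose NTK on a finite sample reduces to the Gram matrix X Xᵀ, and it uses eigenvectors of that matrix as a finite-sample stand-in for NTK eigenfunctions. The cluster means, sizes, noise level, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def linear_ntk_gram(X):
    """Empirical NTK of f(x) = w . x: the gradient w.r.t. w is x, so K = X X^T."""
    return X @ X.T


def make_imbalanced_clusters(n_major=90, n_minor=10, dim=5, noise=0.1, seed=0):
    """Two Gaussian clusters with orthogonal means and unequal sample counts.

    All values here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    mu_major = np.zeros(dim)
    mu_major[0] = 1.0
    mu_minor = np.zeros(dim)
    mu_minor[1] = 1.0
    X_major = mu_major + noise * rng.standard_normal((n_major, dim))
    X_minor = mu_minor + noise * rng.standard_normal((n_minor, dim))
    X = np.vstack([X_major, X_minor])
    is_major = np.arange(n_major + n_minor) < n_major
    return X, is_major


def top_eigvec_mass_on_majority(X, is_major):
    """Return eigenvalues (descending) and the share of the top eigenvector's
    squared mass that sits on majority-cluster samples."""
    K = linear_ntk_gram(X)
    eigvals, eigvecs = np.linalg.eigh(K)  # eigh returns ascending eigenvalues
    top = eigvecs[:, -1]                  # unit-norm eigenvector of the largest one
    mass = float(np.sum(top[is_major] ** 2))
    return eigvals[::-1], mass


X, is_major = make_imbalanced_clusters()
eigvals, mass = top_eigvec_mass_on_majority(X, is_major)
print("two largest eigenvalues:", eigvals[:2])
print("top-eigenvector mass on majority cluster:", mass)
```

In this toy setting the largest eigenvalue scales roughly with the size of the majority cluster, and the corresponding eigenvector is concentrated on majority samples, which is the sense in which an over-represented cluster feature becomes a top "feature" of the kernel.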


Partial Multi-Label Learning with Probabilistic Graphical Disambiguation

Neural Information Processing Systems

In partial multi-label learning (PML), each training example is associated with a set of candidate labels, among which only some labels are valid. As a common strategy to tackle the PML problem, disambiguation aims to recover the ground-truth labeling information from such inaccurate annotations. However, existing approaches mainly rely on heuristics or ad-hoc rules to disambiguate candidate labels, which may not be universal enough in complicated real-world scenarios. To provide a principled way for disambiguation, we make a first attempt to explore a probabilistic graphical model for the PML problem, where a directed graph is tailored to infer latent ground-truth labeling information from the generative process of partial multi-label data. Under the framework of stochastic gradient variational Bayes, a unified variational lower bound is derived for this graphical model, which is further relaxed probabilistically so that the desired prediction model can be induced with simultaneously identified ground-truth labeling information. Comprehensive experiments on multiple synthetic and real-world data sets show that our approach outperforms the state-of-the-art counterparts.



FUSE: Ensembling Verifiers with Zero Labeled Data

arXiv.org Machine Learning

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervised Score Ensembling (FUSE), a method for improving verification quality by ensembling verifiers without access to ground truth correctness labels. The key idea behind FUSE is to control conditional dependencies between verifiers in a manner that improves the unsupervised performance of a class of spectral algorithms from the ensembling literature. Despite requiring zero ground truth labels, FUSE typically matches or improves upon semi-supervised alternatives in test-time scaling experiments with diverse sets of generator models, verifiers, and benchmarks. In particular, we validate our method on both conventional academic benchmarks such as GPQA Diamond and on frontier, unsaturated benchmarks such as Humanity's Last Exam and IMO Shortlist questions.
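As a rough illustration of what label-free spectral ensembling can look like, the sketch below weights simulated binary verifiers by the leading eigenvector of their vote covariance matrix with the diagonal zeroed out. This is a simplified spectral heuristic from the general ensembling literature, not the FUSE method itself: it assumes verifiers are conditionally independent given the true label, whereas the abstract states that FUSE works by controlling conditional dependencies. The verifier accuracies, sample size, and function names are hypothetical, and the ground-truth labels are used only to score the result, never to fit the weights.

```python
import numpy as np


def simulate_verifiers(accuracies, n_items=5000, seed=0):
    """Simulate +/-1 votes from conditionally independent verifiers.

    Returns votes of shape (n_verifiers, n_items) and the hidden true labels.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n_items)
    acc = np.asarray(accuracies)[:, None]
    correct = rng.random((len(accuracies), n_items)) < acc
    votes = np.where(correct, labels, -labels)
    return votes, labels


def spectral_weights(votes):
    """Estimate verifier weights without labels.

    Under conditional independence the off-diagonal covariance is close to
    rank one, so its leading eigenvector tracks verifier quality. Zeroing the
    diagonal is a crude approximation, assumed here for simplicity.
    """
    cov = np.cov(votes)            # rows are treated as variables (verifiers)
    np.fill_diagonal(cov, 0.0)
    _, eigvecs = np.linalg.eigh(cov)
    w = eigvecs[:, -1]             # eigenvector of the largest eigenvalue
    # Resolve the sign ambiguity by assuming most verifiers beat chance.
    return w if w.sum() >= 0 else -w


def ensemble_predict(votes, weights):
    """Weighted majority vote over verifiers."""
    return np.sign(weights @ votes)


accuracies = [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55]
votes, labels = simulate_verifiers(accuracies)
weights = spectral_weights(votes)
ensemble_acc = float(np.mean(ensemble_predict(votes, weights) == labels))
print("estimated weights:", np.round(weights, 3))
print("ensemble accuracy:", ensemble_acc)
```

The sign fix-up is the one place where an extra assumption is unavoidable: an eigenvector is only defined up to sign, so some prior belief (here, that verifiers are better than chance on average) is needed to orient it.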